Inferring the Gibbs state of a small quantum system 
Gibbs states are familiar from statistical mechanics, yet their use is not limited to that domain. 
For instance, they also feature in the maximum entropy reconstruction of quantum states from 
incomplete measurement data. Outside the macroscopic realm, however, estimating a Gibbs state 
is a nontrivial inference task, due to two complicating factors: the proper set of relevant observables 
might not be evident a priori; and whenever data are gathered from a small sample only, the best 
estimate for the Lagrange parameters is invariably affected by the experimenter's prior bias. I 
show how the two issues can be tackled with the help of Bayesian model selection and Bayesian 
interpolation, respectively, and illustrate the use of these Bayesian techniques with a number of 
simple examples. 
I. INTRODUCTION 

Quantum states are not accessible to direct observa- 
tion, and so do not constitute per se a physical reality. 
Rather, they provide a convenient mathematical sum- 
mary of an agent's expectations as to the outcomes of 
future experiments. Such expectations are formed 
both on the basis of past measurement data and on the 
basis of any prior knowledge (say, about specific symme- 
tries) that the agent may have. In practice, the available 
experimental data are often far from perfect: measure- 
ment devices work with limited accuracy; sample sizes 
are finite; and the set of observables measured might not 
be informationally complete. Under such circumstances, 
a quantum state represents merely a model, and hence a 
hypothesis, which is subject to testing, debate, and mod- 
ification. The more complex the physical system under 
study, and the sketchier the available data, the more this 
model will be informed by the agent's prior knowledge. 

Prior knowledge may be of two types: (i) the expecta- 
tion, often based on symmetry considerations, that the 
quantum state has a certain parametric form; and (ii) 
given a parametric form (including free-form as a spe- 
cial case), a bias as to its parameter values. Making 
proper use of such prior knowledge can lead to signif- 
icant gains in the efficiency and accuracy of quantum- 
state tomography, i.e., the reconstruction of a quantum 
state from imperfect data. One recent example where 
prior knowledge about the parametric form has been ex- 
ploited to great advantage, is the polynomial scheme for 
reconstructing near matrix product states. The second 
type of prior knowledge, on the other hand, has been 
used in recent Bayesian modifications to the conventional 
maximum likelihood tomography scheme. 

One parametric form that occupies a special place in 
physics is that of a Gibbs state. Such a state maximizes 
the entropy, or more generally, minimizes the relative 
entropy with respect to some reference state, under 
given constraints on some selected set of expectation 
values. Gibbs states are familiar from statistical 
mechanics where, in both the classical and the 
quantum case, the principle of maximum entropy has 
long been recognised as the appropriate prescription for 
constructing the macrostate. The common justification 
of this principle rests on a number of assumptions: (i) 
the system under consideration may be viewed as one 
constituent of a larger ensemble of identically prepared 
systems, whose size approaches infinity (the "thermodynamic 
limit"); (ii) pertaining to the global state of this 
fictitious infinite ensemble, there are constraints in the 
form of sharp values for the totals of certain observables 
deemed relevant; and (iii) there is clarity as to which ob- 
servables are relevant. While the first two assumptions 
are of a purely statistical nature, the last implicitly in- 
vokes the system's dynamics. In equilibrium statistical 
mechanics, the relevant observables are the system's constants 
of the motion; whereas in nonequilibrium transport 
equations, they typically comprise the slowly varying 
degrees of freedom [9]. 

Gibbs states play an important role even in realms 
where the above assumptions are not justified. For 
instance, hadronization in $e^+e^-$ collisions is described 
with thermal distributions, even though the number of 
hadrons produced in one collision is hardly more than a 
handful [10]. Another example is the extension of thermodynamics 
to nanoscale quantum systems, and in particular, 
to the study of work extraction from such finite 
systems [11-14]. And finally, in incomplete quantum-state 
tomography, Gibbs models feature in the reconstruction 
schemes based on maximum entropy, or 
in case there is an initial bias towards some non-uniform 
reference state, on the principle of minimum relative entropy 
[19]. In all these examples, the measurement data, 
and hence any derived constraints, do not pertain to 
(quasi) infinite ensembles but to real samples which are 
small; and the systems are often too simple to exhibit 
a clear hierarchy of time scales that would lead to an 
obvious choice for the relevant observables. 

When prior knowledge suggests that a small physical 
system is well described by a Gibbs state, yet the proper 
set of relevant observables is not evident a priori, the 
choice of the latter becomes a matter of statistical infer- 
ence. Competing theories might propose different sets of 
relevant observables; and the task is then to decide ratio- 
nally between them on the basis of rather sketchy data. 
Typically, this inference task involves a trade-off between 
goodness-of-fit on the one hand, favoring a large number 
of relevant observables; and simplicity on the other 
("Occam's razor"), favoring a number that is as small as 
possible. The appropriate framework for deciding such 
a trade-off is Bayesian model selection [20]. Adapting 
this framework to the task of finding the optimal set of 
relevant observables in a Gibbs model, be it classical or 
quantum, is one central objective of the present paper. 

Given the set of relevant observables, the next infer- 
ence task is the estimation of the associated Lagrange 
parameters. Whenever the sample size is small, the esti- 
mate must take into account not only experimental data 
but also prior expectations. The fact that the prior bias 
invariably exerts an influence on the parameter estimate, 
is readily seen in a trivial example: If five tosses of a coin 
yield "heads" five times, one is not yet ready to aban- 
don one's prior bias towards a more or less fair coin; only 
as evidence to the contrary accumulates, does this be- 
lief gradually erode. The challenge, then, is to find the 
relative weights to be attributed to prior bias and data. 
Again, the appropriate tools are furnished by Bayesian 
theory, namely Bayesian interpolation [21] in combination 
with the evidence procedure [5]. Putting these tools 
to use for the estimation of Lagrange parameters in a 
Gibbs model, is the second main objective of the present 
paper. 

The paper is organised as follows. In Sec. II I will 
start with some preliminaries about the $\chi^2$ distribution, 
the entropy concentration theorem, and the concept of 
statistical significance, which will be needed in subsequent 
arguments. Then I shall turn to the two inference 
problems outlined above, albeit in reverse order. In Sec. 
III I will assume that the relevant observables of a Gibbs 
model are given, and show how its Lagrange parameters 
can be estimated in a way that accounts for both prior 
knowledge and measured data. The estimation procedure 
will yield not just the optimal values for the Lagrange 
parameters, but also the associated error bars. In 
Sec. IV I shall consider the issue of the proper set of relevant 
observables, and show that Bayesian model selection 
provides a rational framework for choosing between rival 
proposals. I will illustrate this method with two examples, 
the classical analysis of Wolf's die (Sec. V) and 
the quantum problem of deciding between an Ising and 
a Heisenberg description of an assembly of qubits (Sec. 
VI). In Sec. VII I shall conclude with a brief summary. 

There are a number of appendices in which I collect 
technical definitions and results that might not be familiar 
to readers of this journal, yet whose inclusion in the 
main body of the text would render the flow of exposition 
unnecessarily cumbersome. Specifically, in Appendix A 
I shall introduce the notion of a level of description; in 
Appendix B the notions of coarse graining, relevant part 
of a state, and generalised Gibbs states; and in Appendix 
C the definition and basic geometry of a Gibbs manifold, 
which includes as a special case (discussed in Appendix 
D) the geometry of the Bloch sphere. In Appendix E 
I will introduce the concept of an entropic distribution 
on the Gibbs manifold; and in Appendix F I will consider 
the meaning of the Gaussian approximation and 
of the thermodynamic limit. Finally, in Appendix G I 
shall connect the general framework of Gibbs models to 
the familiar terminology and basic relations of thermodynamics. 

II. STATISTICAL SIGNIFICANCE 

In a generic experiment, some selected set of observables, 
spanning the experimental level of description $\mathcal{F}$, 
is measured on $N$ identically prepared copies of a physical 
system, yielding sample means $f$. Given a reference 
state $\sigma$, these experimental data can be represented as a 
Gibbs model $\mu \in \pi^{\sigma}_{\mathcal{F}}(\mathcal{S})$, with Lagrange parameters 
adjusted such as to reproduce the observed sample means, 
$f(\mu) = f$. The correspondence $f \leftrightarrow \mu$ is one-to-one. 
On the same Gibbs manifold, let $\rho$ denote a theoretical 
model yielding expectation values $f(\rho)$; these generally 
differ from the observed sample means. As long as the 
difference is small, the relative entropy between data and 
theoretical model is approximately quadratic in the 
differentials $\delta f$. According to Eq. (C9), it is 

$$2 N S(\mu\|\rho) \approx \chi^2(\mu\|\rho), \tag{1}$$

with 

$$\chi^2(\mu\|\rho) := N \sum_{ab} (C^{-1})^{ab}\, \delta f_a\, \delta f_b. \tag{2}$$

For definitions of further mathematical objects used here 
($\mathcal{F}$, $\pi^{\sigma}_{\mathcal{F}}(\mathcal{S})$, $C^{-1}$), see Appendices A through C. 

Since the theoretical model may contain parameters 
that have been fitted to the data, the differentials $\delta f$ 
might not be all independent; the number $k$ of independent 
differentials is generally smaller than the dimension 
of the Gibbs manifold. Given a (possibly fitted) theoretical 
model, the likelihood that the $k$ remaining independent 
degrees of freedom yield some $\chi^2$ in the interval 
$[x, x + dx]$ is determined for large $N$ by the probability 
density function 

$$\mathrm{pdf}(x|k) = 2^{-k/2}\, \Gamma(k/2)^{-1}\, x^{k/2-1} \exp(-x/2), \tag{3}$$

known as the $\chi^2$ distribution [13]. In this distribution, 
the exponential factor stems from the quantum Stein 
lemma (B4); the power factor from the $k$-dimensional 
volume element; and the numerical factors ensure proper 
normalisation. The $\chi^2$ distribution is peaked at $\chi^2_{\max} = k - 2$ 
(for $k > 2$), and has expectation value and variance 

$$\langle \chi^2 \rangle = k, \qquad \mathrm{var}(\chi^2) = 2k, \tag{4}$$

respectively. For large arguments ($x \gg k \ln k$) it has an 
exponential tail independent of $k$, 

$$\mathrm{pdf}(x|k) \sim \exp(-x/2). \tag{5}$$
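The stated properties of the $\chi^2$ distribution can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names `chi2_pdf` and `moments`, the integration cutoff and the step count are illustrative choices, and the integrals are done with a plain midpoint rule.

```python
import math

def chi2_pdf(x, k):
    """Density of Eq. (3) for x > 0 and k degrees of freedom (evaluated via logs)."""
    log_pdf = (-(k / 2) * math.log(2) - math.lgamma(k / 2)
               + (k / 2 - 1) * math.log(x) - x / 2)
    return math.exp(log_pdf)

def moments(k, upper=200.0, steps=200_000):
    """Midpoint-rule estimates of normalisation, mean and variance on (0, upper)."""
    h = upper / steps
    norm = mean = second = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        w = chi2_pdf(x, k) * h
        norm += w
        mean += x * w
        second += x * x * w
    return norm, mean, second - mean ** 2

k = 5
norm, mean, var = moments(k)
print(norm, mean, var)                       # close to 1, k and 2k
print(chi2_pdf(k - 2, k) > chi2_pdf(k - 2.1, k),
      chi2_pdf(k - 2, k) > chi2_pdf(k - 1.9, k))  # the peak sits at k - 2
```

Working with the logarithm of the density keeps the evaluation stable far out in the exponential tail, where the density itself becomes extremely small.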



The above distribution of $\chi^2$, and hence of relative 
entropy, implies the entropy concentration theorem [23]: 
As the sample size $N$ increases, the relative entropy between 
data and theoretical model is predicted to become 
more and more concentrated (with a width of order 
$1/N$) around a smaller and smaller expectation value 
(also of order $1/N$). Relative entropy being approximately 
quadratic in the coordinate differentials $\delta f$, this 
implies as an immediate corollary that deviations between 
measured sample means and theoretical expectation 
values are expected to scale as $O(1/\sqrt{N})$. The entropy 
concentration theorem can thus be employed to 
assess quickly the statistical significance of experimental 
deviations from theoretical predictions: As long as the 
relative entropy between data and theoretical model is 
of the order $1/N$, deviations likely fall within the range 
of statistical fluctuations; yet as soon as their relative 
entropy exceeds this limit, deviations become significant 
and may indicate the need to revise the theoretical model. 
This simple entropy test is closely related to the $\chi^2$ test in 
conventional statistics. Theoretical models are typically 
rejected whenever at the observed $\chi^2$ the cumulative 
distribution function exceeds some predefined bound, whose 
value in turn depends on the confidence level required. 
The $\chi^2$ test then points to the need to revise a given 
theoretical model, and may trigger creative thinking about 
possible alternatives. 
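The concentration of relative entropy can be illustrated with a small classical simulation. This is a sketch under stated assumptions: the model is a fair six-sided die (so that $k = 5$), the sample size and number of trials are arbitrary illustrative values, and the helper names `two_n_relative_entropy` and `simulate` are hypothetical.

```python
import math
import random
from collections import Counter

def two_n_relative_entropy(counts, probs):
    """2 N S(f || p) for observed counts and model probabilities p."""
    n = sum(counts)
    total = 0.0
    for c, p in zip(counts, probs):
        if c > 0:  # outcomes that never occurred contribute nothing
            total += c * math.log(c / (n * p))
    return 2.0 * total

def simulate(n_samples, trials, probs, seed=0):
    """Draw `trials` samples of size n_samples and return 2 N S for each."""
    rng = random.Random(seed)
    faces = range(len(probs))
    values = []
    for _ in range(trials):
        tally = Counter(rng.choices(faces, weights=probs, k=n_samples))
        counts = [tally.get(i, 0) for i in faces]
        values.append(two_n_relative_entropy(counts, probs))
    return values

n_samples = 2000
probs = [1 / 6] * 6                      # fair die: k = 5 independent sample means
values = simulate(n_samples, trials=400, probs=probs)
mean_chi2 = sum(values) / len(values)    # expected to lie near k = 5
mean_entropy = mean_chi2 / (2 * n_samples)  # typical relative entropy, of order 1/N
print(mean_chi2, mean_entropy)
```

Repeating the run with a larger `n_samples` should leave `mean_chi2` roughly unchanged while `mean_entropy` shrinks in proportion to $1/N$.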



III. ESTIMATING LAGRANGE PARAMETERS 

I consider the situation where it is assumed from the 
outset that a physical system is to be described by some 
Gibbs model $\omega \in \pi^{\sigma}_{\mathcal{G}}(\mathcal{S})$, with given reference state $\sigma$ 
and level of description $\mathcal{G}$, yet unknown parameter values. 
The initial uncertainty about the parameter values is 
reflected in a prior probability distribution $\mathrm{prob}(\omega|\sigma,\mathcal{G})$ 
over the Gibbs manifold. Subsequently, on a sample of 
size $N$, one measures some set of sample means $f$. The 
associated experimental level of description $\mathcal{F}$ may or 
may not coincide with the theoretical level of description 
$\mathcal{G}$. In the light of the observed sample means and the 
prior distribution over the Gibbs manifold, one wants to 
infer the most plausible estimate for $\omega$. 

After collecting the experimental data, the probability 
distribution over the Gibbs manifold must be updated 
according to the Bayes rule [3] 

$$\mathrm{prob}(\omega|f,N,\mathcal{F};\sigma,\mathcal{G}) \propto \mathrm{prob}(f|N,\omega,\mathcal{F})\, \mathrm{prob}(\omega|\sigma,\mathcal{G}). \tag{6}$$

The first factor on the right hand side is the likelihood of 
observing the sample means $f$, given $\omega$; it is 

$$\mathrm{prob}(f|N,\omega,\mathcal{F}) \propto \int_{\mathcal{S}|_f} d\rho\; \mathrm{prob}(\rho|N,\omega), \tag{7}$$

with the integration ranging over the submanifold $\mathcal{S}|_f$ 
of all states that satisfy the constraints $f(\rho) = f$, and 
normalised according to Eq. (C10), 

$$\int \prod_b df_b\, \sqrt{\det C^{-1}}\; \mathrm{prob}(f|N,\omega,\mathcal{F}) = 1. \tag{8}$$

For large $N$, by virtue of the quantum Stein lemma (B4) 
and the law of Pythagoras (F6), the likelihood can be 
written as 

$$\mathrm{prob}(f|N,\omega,\mathcal{F}) \propto \exp[-N S(\mu\|\omega)], \tag{9}$$

where $\mu \in \pi^{\omega}_{\mathcal{F}}(\mathcal{S})$ is the unique Gibbs model associated 
with the measured $f$ and reference state $\omega$. In other 
words, on the Gibbs manifold $\pi^{\omega}_{\mathcal{F}}(\mathcal{S})$, the Gibbs model $\mu$ 
representing experimental data is distributed entropically 
around $\omega$, $\mu \sim \mathrm{Ent}(N,\omega,\mathcal{F})$ (see Appendix E). 

The second factor on the right hand side of the Bayes 
rule (6) is the prior. In principle, it can take any form; 
there is no constraint as to the prior knowledge that an 
agent may have. But there are good reasons to assume 
that it is entropic, too, $\omega \sim \mathrm{Ent}(\alpha,\sigma,\mathcal{G})$, with its peak 
at some initial bias $\sigma$, and the parameter $\alpha$ characterising 
the agent's degree of confidence as to this initial 
bias. Conceptually, if the only prior knowledge available 
is the initial bias $\sigma$, then one would demand of a 
prior distribution on $\pi^{\sigma}_{\mathcal{G}}(\mathcal{S})$ that it be peaked at and 
symmetric around this bias; that it be form-invariant under 
coarse graining; and that upon composition of systems, 
it be non-committal as to any correlations between 
the systems. As I discuss in more detail in Appendix 
E, these requirements are satisfied by entropic distributions. 
In addition, an entropic prior is particularly convenient 
because it is (approximately) conjugate to the 
likelihood (9): Upon any measurement that is informationally 
complete with respect to the unknown model parameters, 
$\mathcal{F} \supset \mathcal{G}$, Bayesian updating yields a posterior 
which (in the Gaussian approximation) is again entropic, 
and which differs from the prior only by a change of parameters, 
$(\alpha,\sigma) \to (\alpha',\sigma')$. 

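Assuming an entropic prior and making the Gaussian 
approximation, the Bayes rule yields for $\mathcal{F} \supset \mathcal{G}$ the 
posterior (F8), and hence 

$$\mathrm{prob}(\omega|f,N,\mathcal{F};\sigma,\mathcal{G}) \propto \mathrm{prob}(\omega|\alpha+N,\rho,\mathcal{G}); \tag{10}$$

whereas for $\mathcal{F} \subset \mathcal{G}$, it yields the posterior (F9), and hence 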

prob(w|/, N, T;a,Q) oc prob(7r£(u;)|a + N, p, T) x 

prob« s if> r(u>)\a, p, ^ g , p T()ll) 

In both cases, the posterior is peaked at the model 

$$\rho \propto \exp\left[ \frac{\alpha}{\alpha+N} \ln\sigma + \frac{N}{\alpha+N} \ln \pi^{\sigma}_{\mathcal{F}\cap\mathcal{G}}(\mu) \right]. \tag{12}$$

This model constitutes the most plausible posterior estimate 
for $\omega$. 

The posterior estimate for $\omega$ interpolates between initial 
bias and data, and depending on the relative sizes of 
$\alpha$ and $N$, may attribute more weight to one or the other; 
this is an example of Bayesian interpolation [21]. In the 
extreme case where the prior is sharply peaked while sample 
sizes are small, $N \ll \alpha$, parameter estimation will be 
dominated by the prior, and one is therefore advised to 
stick to the initial bias, $\rho \approx \sigma$; while in the opposite case 
where the prior is broad and sample sizes are big, $N \gg \alpha$, 
parameter estimation will be dominated by the likelihood 
function, and the best estimate for the model is close to 
the maximum likelihood estimate $\rho \approx \pi^{\sigma}_{\mathcal{F}\cap\mathcal{G}}(\mu)$. Attached 
to the estimate are error bars of the order $O(1/\sqrt{\alpha+N})$ 
as to those model parameters that have been measured, 
and in case $\mathcal{F} \subset \mathcal{G}$, $O(1/\sqrt{\alpha})$ as to those that have not. 
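For a classical system whose full distribution is measured, the interpolation between initial bias and data becomes a normalised geometric mean of the two distributions. The sketch below illustrates the three regimes for a coin; the observed frequencies and the values of $\alpha$ and $N$ are hypothetical, as is the function name `posterior_estimate`.

```python
import math

def posterior_estimate(prior, freqs, alpha, n):
    """Normalised rho ∝ exp[w ln(prior) + (1 - w) ln(freqs)], w = alpha / (alpha + n)."""
    w = alpha / (alpha + n)
    logs = [w * math.log(s) + (1 - w) * math.log(f) for s, f in zip(prior, freqs)]
    z = sum(math.exp(v) for v in logs)
    return [math.exp(v) / z for v in logs]

prior = [0.5, 0.5]   # initial bias: a fair coin
freqs = [0.9, 0.1]   # hypothetical observed relative frequencies (heads, tails)

prior_dominated = posterior_estimate(prior, freqs, alpha=1000, n=10)
balanced = posterior_estimate(prior, freqs, alpha=10, n=10)
data_dominated = posterior_estimate(prior, freqs, alpha=10, n=1000)
print(prior_dominated, balanced, data_dominated)
```

With equal weights the estimate for heads lands at 0.75, halfway between prior and data in terms of log-odds rather than in terms of probabilities; this reflects the exponential form of the interpolation.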

The above estimation procedure preserves the Gibbs 
form, in the following sense. Whenever the prior bias $\sigma$ is 
a generalised Gibbs state, with some level of description 
that encloses $\mathcal{F}\cap\mathcal{G}$, the posterior estimate will retain this 
form, 

$$\sigma \in \pi_{\mathcal{H}}(\mathcal{S}),\ \mathcal{H} \supset \mathcal{F}\cap\mathcal{G} \;\Rightarrow\; \rho \in \pi_{\mathcal{H}}(\mathcal{S}), \tag{13}$$

for arbitrary values of $\alpha$ and $N$. In particular, it is 
$\rho \in \pi^{\sigma}_{\mathcal{F}\cap\mathcal{G}}(\mathcal{S})$. If on the Gibbs manifold $\pi_{\mathcal{H}}(\mathcal{S})$ the prior bias 
has Lagrange parameters $\lambda(\sigma)$, and the maximum likelihood 
estimate has Lagrange parameters $\lambda(\pi^{\sigma}_{\mathcal{F}\cap\mathcal{G}}(\mu))$, 
then the Lagrange parameters of the posterior estimate 
(12) are given by linear interpolation, 

$$\lambda(\rho) = \frac{\alpha}{\alpha+N}\, \lambda(\sigma) + \frac{N}{\alpha+N}\, \lambda(\pi^{\sigma}_{\mathcal{F}\cap\mathcal{G}}(\mu)). \tag{14}$$

The posterior estimate depends critically on the parameter 
$\alpha$; this parameter has so far been left unspecified. 
Provided the experiment reveals a significant deviation 
from the initial bias (only then does the need arise 
to update this bias) and is sufficiently detailed, 

(15) 

the optimal value for $\alpha$ can be estimated a posteriori with 
the help of the evidence procedure. This procedure 
yields an interpolation parameter 

$$\frac{\alpha}{\alpha+N} \approx \frac{\dim \pi^{\sigma}_{\mathcal{F}}(\mathcal{S})}{\chi^2(\pi^{\sigma}_{\mathcal{F}}(\mu)\|\sigma)}. \tag{16}$$

The estimate depends only on the experimental level of 
description $\mathcal{F}$, but not on the theoretical level $\mathcal{G}$ employed 
for the Gibbs model $\omega$. 

To illustrate the above framework, I consider the following 
simple example. A source emits a physical system 
(say, a molecule) which can be in its ground state 
or in one of 24 excited states; this spectrum may or 
may not be degenerate. Prior theoretical considerations 
suggest that the source is thermal, and hence, that the 
occupation probabilities $\{p_i\}$, $i = 0 \ldots 24$, of the energy 
levels follow a canonical distribution. There is uncertainty 
about the temperature, but according to initial 
estimates, it is expected to be around 100 K. As regards 
the system's state, therefore, the initial bias is a 
canonical state $\sigma \propto \exp(-\beta H)$, with level of description 
$\mathcal{H} = \mathrm{span}\{1, H\}$, uniform reference state, and Lagrange 
parameter $\beta(\sigma) \approx 1/100\,\mathrm{K}$. Then one performs 
$N = 12{,}000$ runs of the experiment, and in each run, 
measures the actual occupation of the energy levels. One 
finds that the measured distribution of relative frequencies 
$\{f_i\}$ differs from the expected $\{p_i\}$. The observed 
mean energy, $\sum_i f_i E_i$, corresponds to a temperature of 
110 K rather than 100 K; and in addition, the shape of 
the observed distribution may or may not deviate from 
the canonical form. Altogether, one finds that the data 
differ from prior expectation by a distance, say, 

$$\chi^2(\pi^{\sigma}_{\mathcal{F}}(\mu)\|\sigma) \approx 2N \sum_{i=0}^{24} f_i \ln(f_i/p_i) \approx 96. \tag{17}$$

This deviation is significant enough, and the number of 
independent sample means ($\dim \pi^{\sigma}_{\mathcal{F}}(\mathcal{S}) = 24$) is sufficiently 
large, to satisfy both conditions in Eq. (15). So it 
is justified to apply the evidence procedure, yielding the 
interpolation parameter $\alpha/(\alpha+N) \approx 1/4$. If one insists 
that the system be modelled by a canonical distribution, 
$\mathcal{G} = \mathcal{H}$, then the posterior estimate for its inverse temperature 
is neither the initial $1/100\,\mathrm{K}$ nor the observed 
$1/110\,\mathrm{K}$, but the interpolation (14), which in this example 
yields $\beta(\rho) \approx 1/107.3\,\mathrm{K}$. 


IV. COMPARING LEVELS OF DESCRIPTION 

Up to this point the level of description of the theoretical 
model, and hence the Gibbs manifold $\pi^{\sigma}_{\mathcal{G}}(\mathcal{S})$ from 
which a model was to be selected, have been assumed 
to be given a priori. Now they will become themselves 
subject to statistical inference. 

If a model is to have any explanatory value, its number 
of parameters must be strictly smaller than the number 
of data points; and so its level of description must be a 
proper subspace of the space spanned by the measured 
observables, $\mathcal{G} \subset \mathcal{F}$. In fact, in the spirit of Occam's razor 
one would always prefer simpler models over more com- 
plicated ones; yet when this is taken too far, the fit with 
the data might deteriorate. Striking the right balance 
between simplicity and goodness-of-fit, and determining 
thus the optimal level of description, constitutes a non- 
trivial inference task. In this Section, I shall discuss how 
the Bayesian framework for model selection can guide 
the proper choice of the level of description. If presented 
with two rival proposals for the level of description, this 
framework allows one to evaluate their relative degree of 
plausibility in the light of experimental data and prior 
expectations. 

If a $\chi^2$ analysis has revealed that observed deviations 
from model predictions are statistically significant, 
one might consider moving to a more accurate model 
by expanding the level of description, $\mathcal{G} \to \mathcal{H}$, with 
$\mathcal{G} \subset \mathcal{H} \subset \mathcal{F}$. Provided the priors on the respective Gibbs 
manifolds $\pi^{\sigma}_{\mathcal{G}}(\mathcal{S})$ and $\pi^{\sigma}_{\mathcal{H}}(\mathcal{S})$ are both entropic around the 
same initial bias $\sigma$, the relative plausibility of the two levels 
of description is given by the Bayes rule, 

$$\frac{\mathrm{prob}(\mathcal{G}|\mu,N,\mathcal{F};\alpha,\sigma)}{\mathrm{prob}(\mathcal{H}|\mu,N,\mathcal{F};\alpha,\sigma)} = \frac{\mathrm{prob}(\mathcal{G})}{\mathrm{prob}(\mathcal{H})}\, \frac{\mathrm{prob}(\mu|N,\mathcal{F};\alpha,\sigma,\mathcal{G})}{\mathrm{prob}(\mu|N,\mathcal{F};\alpha,\sigma,\mathcal{H})}. \tag{18}$$

Here $\mathcal{F}$ denotes the experimental level of description, $N$ 
the sample size, and $\mu \in \pi^{\sigma}_{\mathcal{F}}(\mathcal{S})$ the Gibbs model associated 
with the measured data. The parameter $\alpha$, which 
characterises the degree of confidence as to the initial 
bias, is assumed to be identical for both entropic priors; 
this assumption is corroborated by the estimate (16), 
which does not depend on the level of description employed 
for a theoretical model. 

The first factor on the right hand side is the ratio 
of prior preferences which, to be fair, is often taken to 
be of order 1. The second factor can be calculated via 
marginalisation, 

$$\mathrm{prob}(\mu|N,\mathcal{F};\alpha,\sigma,\mathcal{G}) = \int d\omega\; \mathrm{prob}(\mu|N,\omega,\mathcal{F})\, \mathrm{prob}(\omega|\alpha,\sigma,\mathcal{G}), \tag{19}$$

and likewise for $\mathcal{H}$. In the Gaussian approximation, the 
integrand is given by Eq. (F8); which in the regime $N \gg \alpha$, 
with $\rho \approx \pi^{\sigma}_{\mathcal{G}}(\mu)$, yields 

prob(/x|iV, F; a, a, Q) « 

probity TrgOu), ^,**mQ) prob(7rg(^)|a, a, £).(20) 
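The ratio is then 

$$\frac{\mathrm{prob}(\mu|N,\mathcal{F};\alpha,\sigma,\mathcal{G})}{\mathrm{prob}(\mu|N,\mathcal{F};\alpha,\sigma,\mathcal{H})} \approx \frac{N^{s/2} \exp[-N S(\pi^{\sigma}_{\mathcal{H}}(\mu)\|\pi^{\sigma}_{\mathcal{G}}(\mu))]}{\alpha^{s/2} \exp[-\alpha S(\pi^{\sigma}_{\mathcal{H}}(\mu)\|\pi^{\sigma}_{\mathcal{G}}(\mu))]}, \tag{21}$$

where $s := (\dim\mathcal{H} - \dim\mathcal{G})$ denotes the number of additional 
model parameters introduced in the expansion 
$\mathcal{G} \to \mathcal{H}$. The power factors $N^{s/2}$ and $\alpha^{s/2}$ stem from 
the normalisation factors of likelihood and prior, respectively, 
which depend on the dimension of the theoretical 
level of description. 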

Bayesian model selection is thus driven by two main 
factors [13]: (i) a ratio of exponentials (of which, in 
the regime $N \gg \alpha$, the denominator can often be approximated 
by 1) favoring the finer-grained model with 
better fit; and (ii) the "Occam factor" $(N/\alpha)^{s/2}$, which 
favors the simpler model. It is the trade-off between the 
exponentials on the one hand, and the Occam factor on 
the other, which typically determines whether or not the 
level of description should be expanded. If their product 
is much larger than 1, one had better stay with the original, 
coarser-grained description. In contrast, if it is much less 
than 1, one is advised to switch to the finer-grained description. 
And if it is of the order 1, the analysis remains 
inconclusive, and more data must be collected. 
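The trade-off between the exponentials and the Occam factor is easiest to inspect on a logarithmic scale. The following sketch evaluates the logarithm of the ratio (21); all input values are hypothetical, `chi2` is shorthand for $2N S(\pi_{\mathcal{H}}(\mu)\|\pi_{\mathcal{G}}(\mu))$, and the factor of 10 used to operationalise "much larger than 1" is an arbitrary illustrative margin, not a prescription from the text.

```python
import math

def log_plausibility_ratio(n, alpha, s, chi2):
    """ln of the ratio (21): coarse level G over fine level H.

    chi2 stands for 2 N S(pi_H(mu) || pi_G(mu)), the accuracy gain of H over G.
    """
    occam = 0.5 * s * math.log(n / alpha)       # ln of the Occam factor (N/alpha)^(s/2)
    fit = -(1.0 - alpha / n) * chi2 / 2.0       # ln of the ratio of exponentials
    return occam + fit

def verdict(log_ratio, margin=math.log(10.0)):
    """Map the log ratio to a decision; `margin` is an assumed threshold."""
    if log_ratio > margin:
        return "keep G"
    if log_ratio < -margin:
        return "expand to H"
    return "inconclusive"

# Hypothetical inputs: sample size, confidence parameter, and three candidate gains.
for s, chi2 in [(3, 9.0), (2, 262.0), (2, 12.0)]:
    lr = log_plausibility_ratio(n=20_000, alpha=100.0, s=s, chi2=chi2)
    print(s, chi2, round(lr, 2), verdict(lr))
```

A modest accuracy gain spread over several new parameters loses against the Occam factor, whereas a large gain overwhelms it; gains in between leave the comparison undecided.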



It is important to note that the trade-off decision is 
not based on experimental data alone. Rather, it depends 
also on the initial bias $\sigma$ and on the parameter $\alpha$. 
The initial bias constitutes one's starting hypothesis for 
the model, prior to performing any measurements, and 
is usually based entirely on symmetry and other theoretical 
considerations; whereas $\alpha$ quantifies the associated 
degree of confidence. Both $\sigma$ and $\alpha$ reflect prior expectations 
of the agent who conducts the experiment, and so 
in principle, carry aspects which remain irreducibly subjective. 
In practice, however, rational agents typically 
agree on the symmetries of the system under study, and 
hence on a unique initial bias to mirror these symmetries. 
In fact, in many cases the initial bias is just equidistribution, 
$\sigma = 1/d$, being maximally non-committal in the 
absence of any empirical data. The parameter $\alpha$, on the 
other hand, can often be estimated a posteriori with the 
help of the evidence procedure. 

For large $N$, the estimate for $\alpha$ becomes independent 
of $N$, and hence the asymptotic behavior of the ratio (21) 
is governed entirely by its numerator. Models can then 
be selected according to the simple rule of thumb 

$$\frac{\chi^2(\pi^{\sigma}_{\mathcal{H}}(\mu)\|\pi^{\sigma}_{\mathcal{G}}(\mu))}{s}\ \begin{cases} < \ln N &: \text{keep } \mathcal{G} \\ \sim \ln N &: \text{inconclusive} \\ > \ln N &: \text{expand } \mathcal{G} \to \mathcal{H} \end{cases} \tag{22}$$

Loosely speaking, whenever the gain in accuracy per additional 
parameter stays below the threshold $\ln N$, one 
had better stick to the simpler model. Only when this 
threshold is exceeded, is one advised to move to the finer-grained 
model with better fit. The threshold is higher 
than the threshold for mere statistical significance; if 
$1 < \chi^2/s < \ln N$ then the potential accuracy gain is 
significant, yet a refinement of the model is still not recommended. 
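The step from the ratio (21) to the threshold $\ln N$ can be spelled out by taking logarithms. The short derivation below is a sketch; the shorthand $\chi^2 := 2N S(\pi_{\mathcal{H}}(\mu)\|\pi_{\mathcal{G}}(\mu))$ is introduced here for convenience.

```latex
% Shorthand (introduced for this sketch): \chi^2 := 2 N S(\pi_H(\mu) \| \pi_G(\mu))
\ln \frac{\mathrm{prob}(\mu|N,\mathcal{F};\alpha,\sigma,\mathcal{G})}
         {\mathrm{prob}(\mu|N,\mathcal{F};\alpha,\sigma,\mathcal{H})}
  \approx \frac{s}{2} \ln\frac{N}{\alpha}
          - \Bigl(1 - \frac{\alpha}{N}\Bigr) \frac{\chi^2}{2}
  \approx \frac{s}{2} \Bigl[ \ln N - \ln\alpha - \frac{\chi^2}{s} \Bigr]
  \qquad (N \gg \alpha).
```

Once $\alpha$ no longer grows with $N$, the term $\ln\alpha$ is subleading, so the sign of the bracket, and with it the preferred level of description, is decided by comparing $\chi^2/s$ with $\ln N$.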

While Bayesian model selection is a useful quantitative 
tool to guide the search for the proper level of descrip- 
tion, it does not amount to an algorithm leading uniquely 
to "the" ideal level of description. The number of pos- 
sible levels of description is infinite, and while the above 
framework may help choose between any two of them, it 
cannot replace the creative act of coming up with suitable 
candidates [21]. This creative part is beyond the 
realm of pure probability, and must involve additional 
physical considerations such as the study of symmetries, 
conservation laws, and time scales. 



V. WOLF'S DIE 

To warm up for the interesting quantum case, I shall 
illustrate the use of the above mathematical tools in 
a famous classical example, Jaynes' analysis of Wolf's 
die data. Rudolph Wolf (1816-1893), a Swiss astronomer, 
had performed a number of random experiments, 
presumably to check the validity of statistical 
theory. In one of these experiments a die was tossed 
$N = 20{,}000$ times in a way that precluded any systematic 
favoring of any face over any other. The prior expectation 
was a perfect die, $\sigma = 1/6$. However, the observed 
relative frequencies $\{f_i\}$ deviated from this expectation; 
their measured values are shown in Table I. A quick 
analysis reveals that $1/\sqrt{N} \approx 0.007$, so several deviations 
$\Delta_i$ are outside the typical range. More precisely, 
the observed $\chi^2(\mu\|\sigma) \approx 271$ lies in the exponential tail 
far beyond its expected value. The probability density for 
such a large $\chi^2$ is extremely small, $\mathrm{pdf}(271|5) \sim 10^{-56}$, 
pointing to the presence of systematic defects of the die. 

  i    f_i        Delta_i
  1    0.16230    -0.00437
  2    0.17245    +0.00578
  3    0.14485    -0.02182
  4    0.14205    -0.02462
  5    0.18175    +0.01508
  6    0.19660    +0.02993

TABLE I. Wolf's die data: frequency distribution $f$ and its 
deviation $\Delta$ from the uniform distribution. 
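The figures quoted for Wolf's die can be reproduced directly from the tabulated frequencies. The sketch below assumes that, for a classical distribution, the quadratic form of $\chi^2$ reduces to the familiar Pearson expression $N \sum_i \Delta_i^2 / \sigma_i$; it also evaluates the entropic form $2N S(f\|\sigma)$ for comparison. Variable and function names are illustrative.

```python
import math

N = 20_000
freqs = [0.16230, 0.17245, 0.14485, 0.14205, 0.18175, 0.19660]  # Table I
sigma = [1 / 6] * 6                                              # perfect die

# Quadratic (Pearson) form and entropic form 2 N S(f || sigma) of chi^2.
chi2_quadratic = N * sum((f - p) ** 2 / p for f, p in zip(freqs, sigma))
chi2_entropic = 2 * N * sum(f * math.log(f / p) for f, p in zip(freqs, sigma))

def chi2_pdf(x, k):
    """chi^2 density with k degrees of freedom, evaluated via logarithms."""
    log_pdf = (-(k / 2) * math.log(2) - math.lgamma(k / 2)
               + (k / 2 - 1) * math.log(x) - x / 2)
    return math.exp(log_pdf)

k = len(freqs) - 1                    # five independent sample means
typical_deviation = 1 / math.sqrt(N)  # scale of purely statistical fluctuations
tail_density = chi2_pdf(271.0, k)

print(chi2_quadratic, chi2_entropic, typical_deviation, tail_density)
```

Both forms of $\chi^2$ come out close to 271, and the density at that value is of the order $10^{-56}$, in line with the figures quoted in the text.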

To reflect the presumed nature of the die's imperfections, 
one may consider a multitude of different levels of 
description. Three specific examples are (i) the simplest 
level of description, $\mathcal{O} = \mathrm{span}\{1\}$, corresponding to a 
Gibbs manifold $\pi^{\sigma}_{\mathcal{O}}(\mathcal{S})$ that consists of the single state $\sigma$ 
only, where one stubbornly sticks to the initial bias; (ii) 
at the opposite extreme, the most accurate level of description 
$\mathcal{F}$, where one denies the existence of any simple 
explanation for the observed deviations and just introduces 
as many model parameters as data points; and (iii) 
an intermediate level of description $\mathcal{G}$, with two observables 
characterising the two most likely imperfections. 
These are, according to Jaynes: 

• a shift of the center of gravity due to the mass of 
ivory excavated from the spots, which being proportional 
to the number of spots on any side, should 
make the "observable" 

$$G_1^i := i - 3.5 \tag{23}$$

have a nonzero average. Indeed, the measured sample 
mean is $g_1(\mu) = 0.0983 \neq 0$; and 

• errors in trying to machine a perfect cube, which 
will tend to make one dimension (the last side cut) 
slightly different from the other two. It is clear from 
the data that Wolf's die gave a lower frequency for 
the faces (3,4); and therefore that the (3-4) dimension 
was greater than the (1-6) or (2-5) ones. The 
effect of this is that the "observable" 

$$G_2^i := \begin{cases} 1 &: i = 1,2,5,6 \\ -2 &: i = 3,4 \end{cases} \tag{24}$$

has a nonzero average. Indeed, $g_2(\mu) = 0.1393 \neq 0$. 

  refinement    s    chi^2    chi^2/s
  O -> G        2    262      131
  G -> F        3    9        3
  O -> F        5    271      54

TABLE II. Wolf's die data: number of additional model parameters 
and accuracy gain associated with expansions of the 
level of description. 

If this intermediate level of description turned out to be 
the most plausible, it would provide a genuine explana- 
tion, rather than merely a description, of the observed 
data. 
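The two sample means quoted for Jaynes' observables follow from the frequencies in Table I. The sketch below recomputes them and also confirms that both observables average to zero for a perfect die; the function names `g1` and `g2` are illustrative.

```python
freqs = {1: 0.16230, 2: 0.17245, 3: 0.14485,
         4: 0.14205, 5: 0.18175, 6: 0.19660}   # Table I

def g1(i):
    """Observable sensitive to a shift of the center of gravity."""
    return i - 3.5

def g2(i):
    """Observable sensitive to an elongated (3-4) dimension."""
    return 1.0 if i in (1, 2, 5, 6) else -2.0

mean_g1 = sum(f * g1(i) for i, f in freqs.items())
mean_g2 = sum(f * g2(i) for i, f in freqs.items())

# For a perfect die both observables have vanishing expectation value.
uniform_g1 = sum(g1(i) for i in freqs) / 6
uniform_g2 = sum(g2(i) for i in freqs) / 6

print(round(mean_g1, 4), round(mean_g2, 4), uniform_g1, uniform_g2)
```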

The sample size is large enough to warrant the use 
of the rule of thumb (22). Successive refinements 
$\mathcal{O} \to \mathcal{G} \to \mathcal{F}$ of the level of description entail additional model 
parameters and accuracy gains as summarised in Table 
II. Only the first refinement, $\mathcal{O} \to \mathcal{G}$, delivers an accuracy 
gain per additional model parameter that exceeds 
the threshold $\ln N \approx 10$. In contrast, the second refinement 
$\mathcal{G} \to \mathcal{F}$, albeit delivering a further accuracy gain 
that is statistically significant, does not pass this threshold. 
If the intermediate level of description $\mathcal{G}$ were 
not available, and hence there were a choice only between 
the "trivial" level of description $\mathcal{O}$ and the "perfect fit" 
level of description $\mathcal{F}$, the latter would be more plausible. 
Sticking stubbornly to the initial bias is the least 
plausible of the three options. 
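Applying the rule of thumb to the three refinements amounts to comparing each accuracy gain per additional parameter with $\ln N$. A minimal sketch, using the values of $s$ and $\chi^2$ reported for Wolf's die; the labels and the dictionary layout are illustrative.

```python
import math

N = 20_000
threshold = math.log(N)   # roughly 9.9

# (additional parameters s, accuracy gain chi^2) for each refinement
refinements = {
    "O -> G": (2, 262.0),
    "G -> F": (3, 9.0),
    "O -> F": (5, 271.0),
}

decisions = {}
for name, (s, chi2) in refinements.items():
    gain_per_parameter = chi2 / s
    decisions[name] = "expand" if gain_per_parameter > threshold else "keep"
    print(name, round(gain_per_parameter, 1), decisions[name])
```

The outcome mirrors the discussion: moving from the trivial to the intermediate level is warranted, the further step to the perfect-fit level is not, and the perfect fit would only win if the intermediate level were unavailable.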

If presented with the choice between the three levels 
of description outlined above, therefore, statistical anal- 
ysis reveals the intermediate, "explanatory" level of de- 
scription to be the most plausible. This is not to say, 
however, that this is indeed the best level of description: 
One might come up with many more alternative propos- 
als, which would all have to be compared with each other. 
Moreover, even if the above intermediate level of descrip- 
tion were confirmed as the winner, statistical analysis 
would only yield its relative degree of plausibility, and 
would never provide certainty about its being the "true" 
level of description. Statistical analysis cannot replace 
the creative act of designing levels of description which, 
as in the example above, are not only supported by the 
data but also well motivated physically. 



VI. ISING VS. HEISENBERG 

Conceptually, Bayesian model selection for quantum 
systems proceeds in the same way as in the classical 
case. The quantumness of the problem enters through 
the different geometry of the Gibbs manifold. As the 
simplest example, I shall study an exchangeable assem- 
bly of qubits; there, the geometry to consider is that of 
the Bloch sphere. 

Initially, nothing is known about the qubits, so the
prior bias $\sigma$ is uniform. Then measurements on a sample
of $N$ qubits reveal an average Bloch vector of length
$r$, with an orientation $\hat{n}$ that is tilted by a small angle
$\delta\theta$ against the $z$ axis. The Bloch vector length
$r$ is considerably larger than zero, so a new model is
called for which is different from the uniform initial bias.
There might be good physical reasons to expect that
the system under consideration is strongly anisotropic
in the $z$ direction, suggesting a level of description $\mathcal{I}$ (for
"Ising") comprising the $z$ component of Pauli spin only,
$\mathcal{I} = \mathrm{span}\{1, \sigma_z\}$. In view of the observed tilting angle,
however, there might be controversy about this, and a
rival proposal ("Heisenberg") might claim that the level
of description should rather include the full Pauli vector,
$\mathcal{H} = \mathrm{span}\{1, \sigma_x, \sigma_y, \sigma_z\}$.

To weigh these alternatives in the light of the data, one
must evaluate

\[ \chi^2(\mu\|\pi_{\mathcal{I}}(\mu))/s = N C^{-1}_{\theta\theta}\,\delta\theta^2/2, \tag{25} \]

where $C^{-1}_{\theta\theta}$ denotes the polar component of the
entropy-induced metric tensor (D9) on the Bloch sphere. For
instance, for $N = 20{,}000$, $r = 0.73$ and a tilting angle of
1 degree, $\delta\theta = 2\pi/360$, it is $C^{-1}_{\theta\theta} \approx 0.678$ and $\chi^2/s \approx 2.1$.
Although this gain in accuracy is significant, it does
not exceed the threshold $\ln N \approx 10$, and hence Bayesian
model selection favors the simpler anisotropic model. At
an angle of 2 degrees, $\chi^2/s$ grows to approximately 8.3;
and this being close to the threshold, the analysis remains
largely inconclusive. Finally, for a tilting angle of 3 degrees,
the accuracy gain per additional model parameter
attains a value well beyond the threshold, $\chi^2/s \approx 18.6$,
tipping the balance in favor of the more detailed level of
description.

Had one measured a Bloch vector length $r = 0.995$
instead of 0.73, the balance would have tipped in favor 
of the expanded level of description already at a critical 
angle of 1 degree, rather than 2 degrees. In general, the 
more the measured state approaches purity, the more sen- 
sitive the choice of level of description becomes to minor 
directional aberrations from the preferred axis. 
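
The numbers quoted in this section can be reproduced with a short Python sketch. It uses the polar metric component $r\tanh^{-1}r$ from Eq. (D9) and $s = 2$ additional parameters ($\sigma_x$ and $\sigma_y$). As a cross-check it also evaluates the exact qubit relative entropy of Eq. (D6) between the measured state and its "Ising" coarse graining, under the assumption (not spelled out in this section) that the accuracy gain is $\chi^2 = 2N\,S(\mu\|\pi_{\mathcal{I}}(\mu))$. All function names are illustrative.

```python
import math


def metric_theta_theta(r):
    """Polar component r * atanh(r) of the entropy-induced metric, Eq. (D9)."""
    return r * math.atanh(r)


def gain_per_parameter(n, r, degrees, s=2):
    """Quadratic approximation of chi^2 / s as in Eq. (25)."""
    dtheta = math.radians(degrees)
    return n * metric_theta_theta(r) * dtheta ** 2 / s


def relative_entropy_to_ising(r, dtheta):
    """Exact S(mu || pi_I(mu)) from Eq. (D6).

    The coarse-grained state points along z and reproduces <sigma_z>,
    so it has Bloch length r' = r cos(dtheta) and n . n' = cos(dtheta).
    """
    rp = r * math.cos(dtheta)
    return (r * math.atanh(r) - r * math.atanh(rp) * math.cos(dtheta)
            + 0.5 * math.log((1 - r ** 2) / (1 - rp ** 2)))


def gain_per_parameter_exact(n, r, degrees, s=2):
    """chi^2 / s with the assumed definition chi^2 = 2 N S(mu || pi_I(mu))."""
    return 2 * n * relative_entropy_to_ising(r, math.radians(degrees)) / s


if __name__ == "__main__":
    for angle in (1, 2, 3):
        approx = gain_per_parameter(20_000, 0.73, angle)
        exact = gain_per_parameter_exact(20_000, 0.73, angle)
        print(f"{angle} deg: chi^2/s = {approx:.2f} (quadratic), {exact:.2f} (exact)")
```

For small tilting angles the two evaluations agree closely, which is why the quadratic form suffices here.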

VII. CONCLUSIONS 

Outside the macroscopic domain, estimating a Gibbs 
state is a nontrivial inference task, due to two compli- 
cating factors. First, for lack of a clear hierarchy of time 
scales, the proper set of relevant observables might not 
be evident a priori but subject to statistical inference. 
Second, whenever experimental data are gathered from 
a small sample only, the best estimate for the Lagrange 
parameters is invariably affected by the experimenter's 
prior bias. Both issues can be tackled with the help of 
Bayesian techniques, suitably adapted to the problem at 
hand: Bayesian model selection, Bayesian interpolation, 
and the evidence procedure. 

The results presented in this paper may have ramifi- 
cations in a variety of areas. For the study of thermal 
properties of a microscopic system (e.g., a tiny probe
taken from a larger system that is presumed to be thermal,
or the debris from a single collision experiment)
the framework presented here allows one to decide ra- 
tionally between rival theories about the proper set of 
relevant observables, and subsequently, to find the best 
estimate for the associated Lagrange parameters. For in- 
complete quantum-state tomography, the results imply 
Bayesian corrections to the conventional maximum en- 
tropy scheme; these corrections become important when- 
ever sample sizes are small. Moreover, the approach pre- 
sented here yields not just estimates for the Lagrange 
parameters, but also the attached error bars. Finally, on 
a conceptual level, the framework allows for a careful con- 
sideration of the thermodynamic limit, and so may shed 
new light on the long-standing debate about the gener- 
ality of, or possible limitations of, the maximum entropy 
paradigm in statistical mechanics [25-29].

I see three avenues for further research. First, it will be 
interesting to see how the Bayesian corrections to conven- 
tional state reconstruction schemes play out in practice. 
A simple example has been discussed (in the context of 
the evidence procedure) in Ref. [f|; more examples and 
application to real-world experimental data will be the 
subject of further work. Second, while the model se- 
lection framework used here allows one to assess differ- 
ent proposals for the set of relevant observables, it does 
not provide a direct route to the optimal such set. Do- 
ing so requires an extension of Bayesian reasoning from 
the space of states to the space of levels of description, 
which will be tackled in future work. Finally, I consider 
it worthwhile to study in more detail the asymptotic be- 
havior of the schemes presented here, in an effort to un- 
derstand better the emergence of orthodox theory in the 
macroscopic limit. 

Appendix A: Level of description 

Any real linear combination of observables is again an
observable. The observables of a physical system thus
constitute a real vector space. This vector space can
be endowed with a positive definite scalar product, the
canonical correlation function with respect to some
reference state $\sigma$,

\[ \langle X;Y\rangle_\sigma := \int_0^1 d\nu\, \mathrm{tr}(\sigma^\nu X \sigma^{1-\nu} Y); \tag{A1} \]

so it is in fact a Hilbert space. Within this real Hilbert
space of observables, the (typically small) set of observables
$\{G_a\}$ which are deemed relevant for the problem
at hand, together with the unit operator, span a proper
subspace

\[ \mathcal{G} := \mathrm{span}\{1, G_a\}. \tag{A2} \]

This subspace is termed the level of description $\mathcal{G}$.

Levels of description might be related by coarse graining
or complementation. A level of description $\mathcal{G}$ is
"coarser" than another level of description $\mathcal{F}$, $\mathcal{G} \subset \mathcal{F}$, if
the former is a subspace of the latter. The coarse graining
relation $\subset$ induces a partial ordering of the levels of
description, with unique minimal element $\mathcal{O} := \mathrm{span}\{1\}$
and maximal element $\mathcal{A}$, the total Hilbert space of
observables. The level of description $\mathcal{G}$ is "complementary"
to $\mathcal{F}$, $\mathcal{G} = \neg_{\mathcal{A},\sigma}\mathcal{F}$, if observables from both levels of
description together span the entire space of observables,
and if in the reference state $\sigma$ the two levels are
uncorrelated,

\[ \mathcal{G} = \neg_{\mathcal{A},\sigma}\mathcal{F} \;:\Leftrightarrow\; \mathrm{span}\{1, G_a, F_b\} = \mathcal{A}, \quad \langle\delta X;\delta Y\rangle_\sigma = 0 \;\; \forall\, X \in \mathcal{G},\, Y \in \mathcal{F}, \tag{A3} \]

with $\delta X := X - \langle X\rangle_\sigma$. Complementation reverses the
direction of coarse graining,

\[ \mathcal{G} \subset \mathcal{F} \;\Rightarrow\; \neg_{\mathcal{A},\sigma}\mathcal{G} \supset \neg_{\mathcal{A},\sigma}\mathcal{F}, \tag{A4} \]

and when applied twice, it returns the original level of
description,

\[ \neg_{\mathcal{A},\sigma}\neg_{\mathcal{A},\sigma}\mathcal{G} = \mathcal{G}. \tag{A5} \]

The properties of coarse graining and complementation
are reminiscent of those of logical implication and negation.
In this sense, one may say that the space of observables
gives rise to a minimal logical structure.

The intersection and closed hull of two levels of description
are denoted by $\mathcal{G} \cap \mathcal{F}$ and $\mathcal{G} \cup \mathcal{F}$, respectively. In
line with the logical structure mentioned above, the operations
$\cap$, $\cup$ share some properties with the Boolean "and"
and "or" operations such as commutativity, associativity
and reversal under complementation; yet the analogy is
not perfect since in contrast to classical Boolean logic,
they violate distributivity. If the levels of description
pertain to two different physical systems $A$ and $B$ then
it is $\mathcal{G}^A \cap \mathcal{F}^B = \mathcal{O}^{AB}$, and

\[ \mathcal{G}^A \cup \mathcal{F}^B = \mathrm{span}\{1^A \otimes 1^B,\; G_a^A \otimes 1^B,\; 1^A \otimes F_b^B\}. \tag{A6} \]

A further way to concatenate the two constituent levels
of description is by means of the tensor product

\[ \mathcal{G}^A \otimes \mathcal{F}^B := \mathrm{span}\{1^A \otimes 1^B,\; G_a^A \otimes 1^B,\; 1^A \otimes F_b^B,\; G_a^A \otimes F_b^B\}. \tag{A7} \]

Appendix B: Relevant part of a state

For an arbitrary state $\rho$, its relevant part with respect
to a level of description $\mathcal{G}$ and reference state $\sigma$ is the
unique state $\pi^\sigma_{\mathcal{G}}(\rho)$ which for all observables in the level
of description yields the same expectation values as $\rho$,
yet within this constraint, is as close as possible to the
reference state. The distance to the reference state is
measured in terms of the relative entropy [30-33]

\[ S(\rho\|\sigma) := \mathrm{tr}(\rho\ln\rho - \rho\ln\sigma). \tag{B1} \]

The relevant part is thus determined by the minimization

\[ S(\pi^\sigma_{\mathcal{G}}(\rho)\|\sigma) = \min_{g(\rho')=g(\rho)} S(\rho'\|\sigma), \tag{B2} \]

where I employed $g(\rho')$ as a shorthand notation for the
set $\{\langle G_a\rangle_{\rho'}\}$.

That the relative entropy is the appropriate measure
for the distance between two states follows from the
quantum Stein lemma [33-36]. According to this lemma,
given a finite sample of size $N$ taken from an i.i.d. source
of states $\sigma$, the probability that tomography on this sample
will erroneously reveal some different state $\rho$,

\[ \mathrm{prob}_{1-\epsilon}(\rho|N,\sigma) := \inf_{\Gamma}\{\mathrm{prob}(\Gamma|\sigma^{\otimes N}) \,|\, \mathrm{prob}(\Gamma|\rho^{\otimes N}) \geq 1-\epsilon\}, \tag{B3} \]

decreases asymptotically as

\[ \mathrm{prob}_{1-\epsilon}(\rho|N,\sigma) \sim \exp[-N S(\rho\|\sigma)] \tag{B4} \]

regardless of the specific value of the error parameter $\epsilon$
($0 < \epsilon < 1$). The $\Gamma$ featuring in the above definition
are propositions (projection operators) about the sample
which asymptotically, i.e., to within an error probability $\epsilon$
that does not depend on sample size, are compatible with
the sample being in the state $\rho^{\otimes N}$. Taking the infimum
over $\Gamma$ picks that proposition which is most confined, and
hence discriminates best between $\sigma$ and $\rho$. The coefficient
in the exponent is the relative entropy between the two
states, which is thus recognised as the proper measure of
their distinguishability [33].

The relevant part of a state has the generalised Gibbs
form [33]

\[ \pi^\sigma_{\mathcal{G}}(\rho) = Z(\lambda)^{-1} \exp\Big[(\ln\sigma - \langle\ln\sigma\rangle_\sigma) - \sum_a \lambda^a G_a\Big], \tag{B5} \]

with the partition function

\[ Z(\lambda) := \mathrm{tr}\Big\{\exp\Big[(\ln\sigma - \langle\ln\sigma\rangle_\sigma) - \sum_a \lambda^a G_a\Big]\Big\} \tag{B6} \]

ensuring state normalisation, and the Lagrange parameters
$\{\lambda^a\}$ adjusted such that $g(\pi^\sigma_{\mathcal{G}}(\rho)) = g(\rho)$. Amongst
all states of the above generalised Gibbs form, the relevant
part of $\rho$ is that which comes closest to $\rho$ in terms
of relative entropy,

\[ S(\rho\|\pi^\sigma_{\mathcal{G}}(\rho)) = \min_{\rho'} S(\rho\|\pi^\sigma_{\mathcal{G}}(\rho')). \tag{B7} \]

The reference state is often, but not always, the uniform
distribution; if so, the generalised Gibbs state acquires
the more familiar form

\[ \pi_{\mathcal{G}}(\rho) = Z(\lambda)^{-1} \exp\Big(-\sum_a \lambda^a G_a\Big) \tag{B8} \]

(with superscript $\sigma$ omitted) which maximizes the von
Neumann entropy $S[\rho] := -\mathrm{tr}(\rho\ln\rho)$ under the given
constraints.
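
As an illustration of Eqs. (B2) and (B5), the following Python sketch computes the relevant part of a state in the classical special case, where all states are diagonal and can be represented as probability vectors, and where a single relevant observable is used. These restrictions, the bisection bracket, and all function names are assumptions made for the sketch; in this commuting case the constant $\langle\ln\sigma\rangle_\sigma$ is absorbed by the normalisation.

```python
import math


def gibbs(sigma, G, lam):
    """Generalised Gibbs distribution proportional to sigma_i * exp(-lam * G_i)."""
    weights = [s * math.exp(-lam * g) for s, g in zip(sigma, G)]
    Z = sum(weights)
    return [w / Z for w in weights]


def expectation(p, G):
    return sum(pi * gi for pi, gi in zip(p, G))


def relative_entropy(p, q):
    """Classical relative entropy S(p||q) = sum_i p_i ln(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def relevant_part(rho, sigma, G, lo=-50.0, hi=50.0, tol=1e-13):
    """Adjust the Lagrange parameter by bisection until <G> matches that of rho.

    <G> decreases monotonically with lam (its derivative is minus the variance),
    so bisection on the bracket [lo, hi] is sufficient.
    """
    target = expectation(rho, G)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expectation(gibbs(sigma, G, mid), G) > target:
            lo = mid
        else:
            hi = mid
    return gibbs(sigma, G, 0.5 * (lo + hi))


if __name__ == "__main__":
    G = [1, 1, -2, -2, 1, 1]                       # observable of Eq. (24)
    rho = [0.10, 0.15, 0.20, 0.25, 0.20, 0.10]     # hypothetical state
    sigma = [0.20, 0.20, 0.15, 0.15, 0.15, 0.15]   # hypothetical reference state
    pi = relevant_part(rho, sigma, G)
    print("relevant part:", [round(x, 4) for x in pi])
    print("S(rho||sigma)                 =", relative_entropy(rho, sigma))
    print("S(rho||pi) + S(pi||sigma)     =",
          relative_entropy(rho, pi) + relative_entropy(pi, sigma))
```

The last two printed numbers coincide, which reflects the Pythagorean decomposition of the relative entropy with respect to the relevant part.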

Since the relevant part of a state retains complete
information solely about selected degrees of freedom (the
observables contained in the level of description), while
discarding information about the rest, the map
$\pi^\sigma_{\mathcal{G}} : \rho \to \pi^\sigma_{\mathcal{G}}(\rho)$ may be regarded as a coarse graining operation.
Indeed, this operation bears some resemblance to a
projection operator: it is idempotent,

\[ \pi^\tau_{\mathcal{G}} \circ \pi^\sigma_{\mathcal{G}} = \pi^\tau_{\mathcal{G}} \tag{B9} \]

(even for $\tau \neq \sigma$); successive coarse grainings with smaller
and smaller levels of description are equivalent to a
one-step coarse graining with the smallest level of description,

\[ \mathcal{G} \subset \mathcal{F} \;\Rightarrow\; \pi^\sigma_{\mathcal{G}} \circ \pi^\sigma_{\mathcal{F}} = \pi^\sigma_{\mathcal{G}}; \tag{B10} \]

and it is covariant under unitary transformations,

\[ \pi^{U\sigma U^\dagger}_{U\mathcal{G}U^\dagger}(U\rho U^\dagger) = U \pi^\sigma_{\mathcal{G}}(\rho) U^\dagger. \tag{B11} \]

In contrast to a true projection operator, however, the
coarse graining map is in general not linear. In case of
a uniform reference state, the coarse graining map is the
(possibly nonlinear) dual of the Kawasaki-Gunton projector,
a projection superoperator acting on the space of
observables.



Appendix C: Gibbs manifold 

Let $\mathcal{S}$ denote the set of all normalised mixed states
of a given physical system. This set constitutes a
differentiable manifold of dimension $(d^2 - 1)$, where $d$ is
the Hilbert space dimension. In this manifold, states of
the generalised Gibbs form (B5) constitute a submanifold
$\pi^\sigma_{\mathcal{G}}(\mathcal{S})$; I call it the Gibbs manifold associated with level
of description $\mathcal{G}$ and reference state $\sigma$. A point on this
Gibbs manifold, and hence a specific state of generalised
Gibbs form, is a Gibbs model. The Gibbs manifold has
dimension

\[ \dim \pi^\sigma_{\mathcal{G}}(\mathcal{S}) = \dim\mathcal{G} - 1, \tag{C1} \]

which equals the number of relevant observables $\{G_a\}$
as long as these are linearly independent. Coordinates
on the manifold may be the Lagrange parameters $\{\lambda^a\}$
or the expectation values $\{g_a\}$, or any set of
$(\dim\mathcal{G} - 1)$ independent functions thereof. Lagrange parameter
coordinates are related to expectation value coordinates
via

\[ g_a = -\partial(\ln Z)/\partial\lambda^a. \tag{C2} \]

Upon infinitesimal variation of the Lagrange parameters,
the expectation value of an arbitrary observable $A$
changes by

\[ d\langle A\rangle = -\sum_a \langle\delta G_a; A\rangle\, d\lambda^a, \tag{C3} \]

with $\delta G_a := G_a - g_a$, and the expectation values and
the canonical correlation function evaluated in the model
with coordinates $\{\lambda^a\}$. A special case is the variation of
relevant expectation values,

\[ dg_b = -\sum_a d\lambda^a\, C_{ab}, \tag{C4} \]

where the coefficients

\[ C_{ab} := \langle\delta G_a; \delta G_b\rangle = \frac{\partial^2}{\partial\lambda^a \partial\lambda^b} \ln Z \tag{C5} \]

form the $r \times r$ correlation matrix. As the canonical
correlation function has all properties of a positive definite
scalar product in the space of observables, the correlation
matrix is symmetric and positive.
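
Equations (C2) and (C5) can be checked numerically in a simple setting. The sketch below assumes a classical six-sided die with uniform reference state and two commuting observables, so that the canonical correlation function reduces to the ordinary covariance. The first observable is a hypothetical choice made for illustration, the second is the one from Eq. (24); derivatives of $\ln Z$ are approximated by central finite differences.

```python
import math

# Two observables on the six faces of a die (uniform reference state assumed).
G = [
    [i - 3.5 for i in range(1, 7)],  # hypothetical: face value minus its uniform mean
    [1, 1, -2, -2, 1, 1],            # observable of Eq. (24)
]


def boltzmann_weights(lam):
    return [math.exp(-sum(l * g[i] for l, g in zip(lam, G))) for i in range(6)]


def log_Z(lam):
    return math.log(sum(boltzmann_weights(lam)))


def expectations(lam):
    w = boltzmann_weights(lam)
    Z = sum(w)
    return [sum(wi * gi for wi, gi in zip(w, g)) / Z for g in G]


def correlation_matrix(lam):
    """Covariance matrix C_ab = <dG_a dG_b> in the Gibbs model with parameters lam."""
    w = boltzmann_weights(lam)
    Z = sum(w)
    mean = expectations(lam)
    return [[sum(w[i] / Z * (G[a][i] - mean[a]) * (G[b][i] - mean[b])
                 for i in range(6))
             for b in range(len(G))] for a in range(len(G))]


def shifted(lam, a, h):
    out = list(lam)
    out[a] += h
    return out


def d_log_Z(lam, a, h=1e-5):
    """Central difference for the first derivative of ln Z."""
    return (log_Z(shifted(lam, a, h)) - log_Z(shifted(lam, a, -h))) / (2 * h)


def d2_log_Z(lam, a, b, h=1e-4):
    """Central difference for the (mixed) second derivative of ln Z."""
    pp = log_Z(shifted(shifted(lam, a, h), b, h))
    pm = log_Z(shifted(shifted(lam, a, h), b, -h))
    mp = log_Z(shifted(shifted(lam, a, -h), b, h))
    mm = log_Z(shifted(shifted(lam, a, -h), b, -h))
    return (pp - pm - mp + mm) / (4 * h * h)


if __name__ == "__main__":
    lam = [0.1, -0.2]  # hypothetical Lagrange parameters
    print("g          :", expectations(lam))
    print("-dlnZ/dlam :", [-d_log_Z(lam, a) for a in range(2)])
    print("C          :", correlation_matrix(lam))
```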

The Gibbs manifold is endowed with a natural Riemannian
metric and volume element, induced by the relative
entropy [39, 40]. As one would expect from a proper
distance measure, the relative entropy between two states
is always positive,



\[ S(\rho\|\rho') \geq 0, \tag{C6} \]

with equality if and only if $\rho = \rho'$; and even though
it is in general not symmetric, $S(\rho\|\rho') \neq S(\rho'\|\rho)$, it is
approximately so for nearby states:

\[ S(\rho\|\rho + \delta\rho) \sim O((\delta\rho)^2). \tag{C7} \]



The relative entropy between two points $\omega, \omega'$ on the
same Gibbs manifold is

\[ S(\omega\|\omega') = \sum_a (\lambda'^a - \lambda^a)\, g_a + (\ln Z' - \ln Z); \tag{C8} \]

which for nearby states is approximately quadratic in the
coordinate differentials,

\[ S(\omega\|\omega + \delta\omega) \approx (1/2) \sum_{ab} C_{ab}\, \delta\lambda^a \delta\lambda^b = (1/2) \sum_{ab} (C^{-1})^{ab}\, \delta g_a \delta g_b. \tag{C9} \]

The correlation matrix $C$ or its inverse $C^{-1}$, respectively,
may thus be regarded as a metric tensor on the Gibbs
manifold. Associated with this metric is the volume element

\[ \int_{\pi^\sigma_{\mathcal{G}}(\mathcal{S})} d\omega = \int \prod_a d\lambda^a \sqrt{\det C} = \int \prod_a dg_a \sqrt{\det C^{-1}}. \tag{C10} \]

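Given some coarser level of description $\mathcal{H}$, $\mathcal{H} \subset \mathcal{G}$, the
Gibbs manifold $\pi^\sigma_{\mathcal{G}}(\mathcal{S})$ can be viewed as a fiber bundle,
with the reduced Gibbs manifold $\pi^\sigma_{\mathcal{H}}(\mathcal{S})$ as its base, and
the coarse graining map $\pi^\sigma_{\mathcal{H}}$ as the bundle projection,

\[ \pi^\sigma_{\mathcal{H}} : \pi^\sigma_{\mathcal{G}}(\mathcal{S}) \ni \omega \to \zeta \in \pi^\sigma_{\mathcal{H}}(\mathcal{S}). \tag{C11} \]

The fiber over $\zeta$ is the submanifold of Gibbs models
satisfying the constraint $h(\omega) = h(\zeta)$,

\[ \pi^\sigma_{\mathcal{G}} \circ (\pi^\sigma_{\mathcal{H}})^{-1}(\zeta) = \pi^\sigma_{\mathcal{G}}(\mathcal{S})|_{h(\zeta)}. \tag{C12} \]

It is then possible to factorize volume elements of the
original Gibbs manifold into those of its fiber and base,

\[ \int_{\pi^\sigma_{\mathcal{G}}(\mathcal{S})} d\omega = \int_{\pi^\sigma_{\mathcal{H}}(\mathcal{S})} d\zeta \int_{\pi^\sigma_{\mathcal{G}}(\mathcal{S})|_{h(\zeta)}} d\omega. \tag{C13} \]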

Appendix D: Geometry of the Bloch sphere 

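Any normalised mixed state of a single qubit can be
written as

\[ \rho = (1/2)(1 + \langle\vec{\sigma}\rangle_\rho \cdot \vec{\sigma}), \tag{D1} \]

with $\vec{\sigma}$ defined as the vector of Pauli matrices,
$\vec{\sigma} := (\sigma_x, \sigma_y, \sigma_z)$. The expectation value of the latter is the
Bloch vector; it has the spatial direction $\hat{n}$ and length $r$,

\[ \langle\vec{\sigma}\rangle_\rho = r\hat{n}. \tag{D2} \]

The Pauli matrices being informationally complete, the
above state can always be brought into the Gibbs form

\[ \rho = Z(\vec{\lambda})^{-1} \exp(-\vec{\lambda} \cdot \vec{\sigma}), \tag{D3} \]

with Lagrange parameters

\[ \vec{\lambda} = -(\tanh^{-1} r)\,\hat{n} \tag{D4} \]

and partition function

\[ Z(\vec{\lambda}) = 2\cosh|\vec{\lambda}| = 2/\sqrt{1 - r^2}. \tag{D5} \]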
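
The relations (D2), (D4) and (D5) can be verified with a short consistency check. The sketch below maps a Bloch vector to its Lagrange parameters, evaluates the partition function, and recovers the Bloch vector via $\langle\vec{\sigma}\rangle = -\partial\ln Z/\partial\vec{\lambda} = -\tanh|\vec{\lambda}|\,\vec{\lambda}/|\vec{\lambda}|$, which follows from Eq. (C2) applied to $Z = 2\cosh|\vec{\lambda}|$. Function names are illustrative.

```python
import math


def lagrange_from_bloch(r, n):
    """Eq. (D4): lambda = -atanh(r) * n for a unit vector n and 0 <= r < 1."""
    a = -math.atanh(r)
    return [a * c for c in n]


def partition_function(lam):
    """Eq. (D5): Z = 2 cosh|lambda|."""
    return 2.0 * math.cosh(math.sqrt(sum(c * c for c in lam)))


def bloch_from_lagrange(lam):
    """Bloch vector <sigma> = -grad ln Z = -tanh|lambda| * lambda / |lambda|."""
    norm = math.sqrt(sum(c * c for c in lam))
    if norm == 0.0:
        return [0.0, 0.0, 0.0]
    return [-math.tanh(norm) * c / norm for c in lam]


if __name__ == "__main__":
    r = 0.73
    theta = math.radians(1.0)
    n = [math.sin(theta), 0.0, math.cos(theta)]  # unit vector tilted against z
    lam = lagrange_from_bloch(r, n)
    print("lambda =", lam)
    print("Z =", partition_function(lam), "vs", 2 / math.sqrt(1 - r ** 2))
    print("Bloch vector =", bloch_from_lagrange(lam))
```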



The relative entropy between two arbitrary qubit
states is

\[ S(\rho\|\rho') = r\tanh^{-1}r - r\tanh^{-1}r'\,(\hat{n} \cdot \hat{n}') + (1/2)\ln((1-r^2)/(1-r'^2)); \tag{D6} \]

which for nearby states becomes approximately

\[ S(\rho\|\rho') \approx (1/2)[C^{-1}_{rr}\,\delta r^2 + C^{-1}_{\theta\theta}\,\delta\theta^2 + C^{-1}_{\phi\phi}\,\delta\phi^2]. \tag{D7} \]

Here $(r, \theta, \phi)$ are the spherical coordinates of the Bloch
vector as defined by

\[ \langle\sigma_x\rangle = r\sin\theta\cos\phi, \quad \langle\sigma_y\rangle = r\sin\theta\sin\phi, \quad \langle\sigma_z\rangle = r\cos\theta, \tag{D8} \]

with $r \in [0,1]$, $\theta \in [0,\pi]$ and $\phi \in [0,2\pi)$; and $C^{-1}$
denotes the entropy-induced metric tensor (inverse of the
correlation matrix) on the Bloch sphere. In spherical
coordinates this metric tensor is diagonal,

\[ C^{-1} = \mathrm{diag}(1/(1-r^2),\; r\tanh^{-1}r,\; r\tanh^{-1}r\,\sin^2\theta), \tag{D9} \]

but differs from the ordinary metric $\mathrm{diag}(1, r^2, r^2\sin^2\theta)$.
Consequently, the associated volume element

\[ \sqrt{\det C^{-1}} = r\tanh^{-1}r\,\sin\theta/\sqrt{1-r^2}, \tag{D10} \]

too, differs from its ordinary counterpart, especially near
the surface of the Bloch sphere:

\[ \sqrt{\det C^{-1}} \approx \begin{cases} r^2\sin\theta & : r \ll 1 \\ \tanh^{-1}r\,\sin\theta/\sqrt{1-r^2} & : r \to 1 \end{cases} \tag{D11} \]

Distinguishable quantum states are thus not spread uniformly
throughout the Bloch sphere as one might expect
classically, but are concentrated on or near its surface.

Appendix E: Entropic distribution

The coordinates of a Gibbs model $\omega \in \pi^\sigma_{\mathcal{G}}(\mathcal{S})$, and
hence its location on the Gibbs manifold, might not be
precisely known but have some probability distribution.
Such a distribution over the Gibbs manifold is entropic,
$\omega \sim \mathrm{Ent}(\alpha, \sigma, \mathcal{G})$, if it has the form

\[ \mathrm{prob}(\omega|\alpha, \sigma, \mathcal{G}) \propto \begin{cases} \exp[-\alpha S(\omega\|\sigma)] & : \omega \in \pi^\sigma_{\mathcal{G}}(\mathcal{S}) \\ 0 & : \text{else} \end{cases} \tag{E1} \]

with $\alpha > 0$ and a factor of proportionality that does
not depend on $\omega$. For large $\alpha$ this is approximately a
Gaussian on $\pi^\sigma_{\mathcal{G}}(\mathcal{S})$ of width $1/\sqrt{\alpha}$ around the reference
state $\sigma$.

The entropic distribution has a number of important
properties. (i) If $\omega$ is entropically distributed then so is
$U\omega U^\dagger$ for any unitary $U$, with co-transformed reference
state and level of description,

\[ \mathrm{prob}(U\omega U^\dagger|\alpha, U\sigma U^\dagger, U\mathcal{G}U^\dagger) = \mathrm{prob}(\omega|\alpha, \sigma, \mathcal{G}). \tag{E2} \]

(ii) Coarse graining $\mathcal{G} \to \mathcal{H} \subset \mathcal{G}$ leaves relative
probabilities invariant,

\[ \mathrm{prob}(\pi^\sigma_{\mathcal{H}}(\omega)|\alpha, \sigma, \mathcal{H}) \propto \mathrm{prob}(\pi^\sigma_{\mathcal{H}}(\omega)|\alpha, \sigma, \mathcal{G}), \tag{E3} \]

with a factor of proportionality which is independent of
$\omega$. (iii) If the reference state is uncorrelated then the
entropic distribution does not introduce any bias towards
spurious correlations,

\[ \mathrm{prob}(\omega^{AB}|\alpha, \sigma^A \otimes \sigma^B, \mathcal{G}^A \otimes \mathcal{F}^B) \leq \mathrm{prob}(\omega^A \otimes \omega^B|\alpha, \sigma^A \otimes \sigma^B, \mathcal{G}^A \otimes \mathcal{F}^B), \tag{E4} \]

where $\omega^A, \omega^B$ are the respective reductions of $\omega^{AB}$. And
(iv) for uncorrelated states, the probability factorises,

\[ \mathrm{prob}(\omega^A \otimes \omega^B|\alpha, \sigma^A \otimes \sigma^B, \mathcal{G}^A \otimes \mathcal{F}^B) \propto \mathrm{prob}(\omega^A|\alpha, \sigma^A, \mathcal{G}^A)\, \mathrm{prob}(\omega^B|\alpha, \sigma^B, \mathcal{F}^B). \tag{E5} \]

If the reference state $\sigma$ is uniform then the entropic
distribution is in fact the only probability distribution with
the above four properties, both in the classical [41] and
in the quantum case [42]. In contrast, for arbitrary $\sigma$ the
uniqueness of the entropic distribution has been shown
in the classical case only [43]; but I conjecture that this
result, too, should carry over to the quantum case.
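Finally, the product of two entropic distributions is
again entropic,

\[ \mathrm{prob}(\omega|N, \mu, \mathcal{G})\, \mathrm{prob}(\omega|\alpha, \sigma, \mathcal{G}) \propto \mathrm{prob}(\omega|\alpha + N, \rho(\mu, \sigma; t), \mathcal{G}), \tag{E6} \]

provided they are defined over the same Gibbs manifold,
$\pi^\mu_{\mathcal{G}}(\mathcal{S}) = \pi^\sigma_{\mathcal{G}}(\mathcal{S})$. Here $\rho(\mu, \sigma; t) \in \pi^\sigma_{\mathcal{G}}(\mathcal{S})$ denotes the
interpolated reference state

\[ \rho(\mu, \sigma; t) \propto \exp[(1 - t)\ln\mu + t\ln\sigma] \tag{E7} \]

with $t := \alpha/(\alpha + N)$.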
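
The product rule (E6) can be illustrated in the classical case, where all states are diagonal probability vectors and the interpolated state (E7) becomes a normalised weighted geometric mean. The sketch below is restricted to that commuting case, which is an assumption made for simplicity; the numerical values are hypothetical. It checks that the logarithm of the product of the two entropic weights differs from the logarithm of the combined entropic weight only by a constant that does not depend on $\omega$.

```python
import math


def relative_entropy(p, q):
    """Classical relative entropy S(p||q) = sum_i p_i ln(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def interpolate(mu, sigma, t):
    """Eq. (E7) for commuting states: proportional to mu^(1-t) * sigma^t."""
    w = [m ** (1 - t) * s ** t for m, s in zip(mu, sigma)]
    Z = sum(w)
    return [x / Z for x in w]


def log_product(omega, mu, sigma, n, alpha):
    """Log of the unnormalised left-hand side of Eq. (E6)."""
    return -n * relative_entropy(omega, mu) - alpha * relative_entropy(omega, sigma)


def log_combined(omega, mu, sigma, n, alpha):
    """Log of the unnormalised right-hand side of Eq. (E6)."""
    t = alpha / (alpha + n)
    rho_interp = interpolate(mu, sigma, t)
    return -(alpha + n) * relative_entropy(omega, rho_interp)


if __name__ == "__main__":
    mu = [0.5, 0.3, 0.2]       # hypothetical state suggested by the data
    sigma = [0.2, 0.3, 0.5]    # hypothetical prior reference state
    n, alpha = 40.0, 10.0
    for omega in ([0.4, 0.4, 0.2], [0.1, 0.6, 0.3]):
        offset = (log_product(omega, mu, sigma, n, alpha)
                  - log_combined(omega, mu, sigma, n, alpha))
        print(omega, "offset =", offset)
```

Both test points yield the same offset, so the two sides of (E6) agree up to a factor independent of $\omega$.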



Appendix F: Gaussian approximation and 
thermodynamic limit 

In many practical applications the states under
consideration are all concentrated inside some small region
of state space. It is then often justified to make the
Gaussian approximation, in which relative entropies are
quadratic in the coordinate differentials. In this
approximation the relative entropy is symmetric,

\[ S(\rho\|\omega) \approx S(\omega\|\rho). \tag{F1} \]

Furthermore, the normalisation factor of the entropic
distribution becomes

\[ \int d\omega\, \exp[-\alpha S(\omega\|\sigma)] \approx (2\pi/\alpha)^{\dim \pi^\sigma_{\mathcal{G}}(\mathcal{S})/2}, \tag{F2} \]

and thus independent not only of $\omega$ but also of $\sigma$. As a
consequence, the entropic distribution becomes invariant
under exchange of $\omega$ and $\sigma$,

\[ \mathrm{prob}(\omega|\alpha, \sigma, \mathcal{G}) \approx \mathrm{prob}(\sigma|\alpha, \omega, \mathcal{G}). \tag{F3} \]

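When the Gibbs manifold $\pi^\sigma_{\mathcal{G}}(\mathcal{S})$ is considered as a fiber
bundle (see Appendix C), with some reduced manifold
$\pi^\sigma_{\mathcal{H}}(\mathcal{S})$, $\mathcal{H} \subset \mathcal{G}$, as its base, then in the Gaussian
approximation the fiber over $\zeta \in \pi^\sigma_{\mathcal{H}}(\mathcal{S})$ is given by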

^oW-'CCW^wP). (F4) 
Moreover, for any $\omega \in \pi^\sigma_{\mathcal{G}}(\mathcal{S})$ the four states

7T^(w) " UJ 

I I (F5) 

form a rectangle as shown, with opposite sides having 
approximately equal length as measured by the relative 
entropy. Together with the (exact) law of Pythagoras [UJ 
for the relative entropy, 

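\[ S(\omega\|\sigma) = S(\omega\|\pi^\sigma_{\mathcal{H}}(\omega)) + S(\pi^\sigma_{\mathcal{H}}(\omega)\|\sigma), \tag{F6} \]

these properties imply that the entropic distribution
factorises into separate distributions over fiber and base,

\[ \mathrm{prob}(\omega|\alpha, \sigma, \mathcal{G}) \approx \mathrm{prob}(\pi^\sigma_{\neg_{\mathcal{G},\sigma}\mathcal{H}}(\omega)|\alpha, \sigma, \neg_{\mathcal{G},\sigma}\mathcal{H}) \times \mathrm{prob}(\pi^\sigma_{\mathcal{H}}(\omega)|\alpha, \sigma, \mathcal{H}). \tag{F7} \]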

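If a model $\mu$ is entropically distributed,
$\mu \sim \mathrm{Ent}(N, \omega, \mathcal{F})$, around a reference state $\omega$ which is itself
entropically distributed, $\omega \sim \mathrm{Ent}(\alpha, \sigma, \mathcal{G})$, then in the
Gaussian approximation and for $\mathcal{F} \supset \mathcal{G}$, the product of
these entropic distributions is

\[ \mathrm{prob}(\mu|N, \omega, \mathcal{F})\, \mathrm{prob}(\omega|\alpha, \sigma, \mathcal{G}) \propto \mathrm{prob}(\mu|N, \rho, \mathcal{F}) \times \mathrm{prob}(\omega|\alpha + N, \rho, \mathcal{G})\, \mathrm{prob}(\rho|\alpha, \sigma, \mathcal{F} \cap \mathcal{G}), \tag{F8} \]

with a factor of proportionality that is independent of
both $\mu$ and $\omega$, and with $\rho \in \pi^\sigma_{\mathcal{F} \cap \mathcal{G}}(\mathcal{S})$ short for the
interpolated state $\rho(\pi^\sigma_{\mathcal{F} \cap \mathcal{G}}(\mu), \sigma; t)$ as defined in Eq. (E7). In
the case $\mathcal{F} \subset \mathcal{G}$, the product is approximately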

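\[ \mathrm{prob}(\mu|N, \omega, \mathcal{F})\, \mathrm{prob}(\omega|\alpha, \sigma, \mathcal{G}) \propto \mathrm{prob}(\pi^\rho_{\mathcal{F}}(\mu)|N, \rho, \mathcal{F})\, \mathrm{prob}(\pi^\rho_{\mathcal{F}}(\omega)|\alpha + N, \rho, \mathcal{F}) \times \mathrm{prob}(\pi^\rho_{\neg_{\mathcal{G},\rho}\mathcal{F}}(\omega)|\alpha, \rho, \neg_{\mathcal{G},\rho}\mathcal{F})\, \mathrm{prob}(\rho|\alpha, \sigma, \mathcal{F} \cap \mathcal{G}). \tag{F9} \]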

In the thermodynamic limit the parameter $N$ (but not
the parameter $\alpha$) approaches infinity, $N \to \infty$. The
interpolation parameter $t$ then approaches zero, $t \to 0$, and
as a consequence, $\rho \to \pi^\sigma_{\mathcal{F} \cap \mathcal{G}}(\mu)$. In this limit the above
product approaches asymptotically, for $\mathcal{F} \supset \mathcal{G}$,

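\[ \mathrm{prob}(\mu|N, \omega, \mathcal{F})\, \mathrm{prob}(\omega|\alpha, \sigma, \mathcal{G}) \sim \delta_{\neg_{\mathcal{F},\sigma}\mathcal{G}}(\mu - \rho) \times \delta_{\mathcal{G}}(\omega - \rho)\, \mathrm{prob}(\rho|\alpha, \sigma, \mathcal{F} \cap \mathcal{G}), \tag{F10} \]

where $\delta_{\neg_{\mathcal{F},\sigma}\mathcal{G}}$ and $\delta_{\mathcal{G}}$ are multi-dimensional delta functions
on the Gibbs manifolds $\pi^\sigma_{\neg_{\mathcal{F},\sigma}\mathcal{G}}(\mathcal{S})$ and $\pi^\sigma_{\mathcal{G}}(\mathcal{S})$,
respectively; and for $\mathcal{F} \subset \mathcal{G}$,

\[ \mathrm{prob}(\mu|N, \omega, \mathcal{F})\, \mathrm{prob}(\omega|\alpha, \sigma, \mathcal{G}) \sim \delta_{\mathcal{F}}(\pi^\rho_{\mathcal{F}}(\omega) - \rho) \times \mathrm{prob}(\pi^\rho_{\neg_{\mathcal{G},\rho}\mathcal{F}}(\omega)|\alpha, \rho, \neg_{\mathcal{G},\rho}\mathcal{F})\, \mathrm{prob}(\rho|\alpha, \sigma, \mathcal{F} \cap \mathcal{G}). \tag{F11} \]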

Appendix G: Link to thermodynamics 

Much of conventional thermodynamics amounts to
exploring the differential geometry of the Gibbs manifold,
and in particular, transforming its coordinates to that
set of variables which is best suited for the problem at
hand. Contained in this set is usually the thermodynamic
entropy, which for a Gibbs model $\omega \in \pi^\sigma_{\mathcal{G}}(\mathcal{S})$ is defined
as

\[ S := -S(\omega\|\sigma) - \langle\ln\sigma\rangle_\sigma. \tag{G1} \]

(I employ natural units with $k_B = 1$.) If the reference
state is uniform, this reduces to the more familiar
expression $S = -\langle\ln\omega\rangle_\omega$. The thermodynamic entropy is
related to the Lagrange parameters and expectation values
via

\[ S = \ln Z + \sum_a \lambda^a g_a, \tag{G2} \]

with differential

\[ dS = \sum_a \lambda^a\, dg_a. \tag{G3} \]

In addition to the Lagrange parameters $\{\lambda^a\}$, the
partition function $Z$ might depend on further parameters
$\{\xi^b\}$; these might be, say, parameters that determine the
choice of Hilbert space (e.g., a fixed spatial volume or
particle number) or control parameters on which the
operators $\{G_a\}$ depend (e.g., an external field). Associated
with these parameters $\{\xi^b\}$ are then further variables

\[ \kappa_b := \partial(\ln Z)/\partial\xi^b. \tag{G4} \]

Taking these into account, the entropy differential reads

\[ dS = \sum_a \lambda^a\, dg_a + \sum_b \kappa_b\, d\xi^b. \tag{G5} \]

Being the only assured constant of the motion, the
internal energy $U$ features always as a variable in
conventional thermodynamics, with the inverse temperature
$\beta$ as its conjugate. Depending on whether the energy is
given on average or as a sharp constraint, the pair $(U, \beta)$
may be of the type $(g, \lambda)$ ("canonical ensemble") or $(\xi, \kappa)$
("microcanonical ensemble"), respectively. In both cases,
defining the temperature

\[ T := 1/\beta \tag{G6} \]

and new variables

\[ w^a := -\lambda^a/\beta, \quad \chi_b := -\kappa_b/\beta, \tag{G7} \]

one obtains from Eq. (G5) the first law of thermodynamics,

\[ dU = \underbrace{T\,dS}_{\delta Q} + \underbrace{\sum_a w^a\, dg_a + \sum_b \chi_b\, d\xi^b}_{\delta W}. \tag{G8} \]

Here the differential $\delta Q$ denotes heat, and $\delta W$ denotes
work. Some common choices for the pairs $(g, w)$ and
$(\xi, \chi)$ are listed in Table III.

The internal energy $U$ is an example of a thermodynamic
potential. Other important examples are the free
energy

\[ F := U - TS \tag{G9} \]

and the grand potential

\[ A := U - TS - \sum_a w^a g_a, \tag{G10} \]

with respective differentials

\[ dF = -S\,dT + \sum_a w^a\, dg_a + \sum_b \chi_b\, d\xi^b \tag{G11} \]

and

\[ dA = -S\,dT - \sum_a g_a\, dw^a + \sum_b \chi_b\, d\xi^b. \tag{G12} \]

The latter implies, e.g., $S = -(\partial A/\partial T)_{w,\xi}$, where the
subscripts denote the variables to be kept fixed when
taking the partial derivative.

$(g, w)$      $(\xi, \chi)$     names
----------------------------------------------------------------
$(p, v)$                        momentum, velocity
                                angular momentum, angular velocity
                                particle number, chemical potential
$(M, B)$      $(B, -M)$         magnetic field, magnetization
$(P, E)$      $(E, -P)$         electric field, electric polarization
              $(V, -p)$         volume, pressure

TABLE III. Common examples of thermodynamic variables.
In cases where two alternative pairings are given, the proper
choice depends on the specific situation: For instance, $(M, B)$
should be used if the magnetization is an (approximate)
constant of the motion and given on average, whereas $(B, -M)$
should be employed if the magnetic field is an external control
parameter for the Hamiltonian.

The grand potential is directly linked to the partition 
function, 

\[ A(T, w^a, \xi^b) = -T \ln Z(T, w^a, \xi^b), \tag{G13} \]

which in turn can be calculated microscopically. A key 
part of statistical mechanics is determining the partition 
function and hence the grand potential, and subsequently 
relating the latter, via suitable coordinate transforma- 
tions on the Gibbs manifold, to the other thermodynamic 
variables of interest. 
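
As a minimal worked example of this programme, consider a single two-level system with energies $\pm\varepsilon$ in the canonical ensemble, with the energy as the only relevant observable and a uniform reference state. The model and the value of `EPS` are hypothetical choices made for illustration. With no further variables $w^a$, the grand potential (G13) coincides with the free energy (G9). The sketch computes the partition function, obtains the entropy from Eq. (G2), and compares it with $S = -\partial A/\partial T$ (evaluated by a central finite difference) and with the von Neumann entropy of the Gibbs state.

```python
import math

EPS = 1.0  # hypothetical level splitting; the two energies are -EPS and +EPS


def log_Z(T):
    """ln Z for the two-level system, Z = 2 cosh(EPS / T), with k_B = 1."""
    return math.log(2.0 * math.cosh(EPS / T))


def internal_energy(T):
    """U = -d(ln Z)/d(beta) = -EPS * tanh(EPS / T), cf. Eq. (C2)."""
    return -EPS * math.tanh(EPS / T)


def entropy(T):
    """Eq. (G2) with a single Lagrange parameter lambda = beta = 1/T."""
    return log_Z(T) + internal_energy(T) / T


def grand_potential(T):
    """Eq. (G13); identical to the free energy F = U - T S in this example."""
    return -T * log_Z(T)


def entropy_from_potential(T, h=1e-5):
    """S = -dA/dT via a central finite difference."""
    return -(grand_potential(T + h) - grand_potential(T - h)) / (2.0 * h)


def von_neumann_entropy(T):
    """-sum p ln p over the two Boltzmann probabilities."""
    weights = [math.exp(EPS / T), math.exp(-EPS / T)]
    Z = sum(weights)
    return -sum(w / Z * math.log(w / Z) for w in weights)


if __name__ == "__main__":
    T = 0.7
    print("S from (G2)        :", entropy(T))
    print("S from -dA/dT      :", entropy_from_potential(T))
    print("von Neumann entropy:", von_neumann_entropy(T))
    print("A vs U - T S       :", grand_potential(T),
          internal_energy(T) - T * entropy(T))
```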